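Non-holonomic vehicle motion has been studied extensively using physics-based models. The common approach with these models is to use a linear tire model to account for wheel/ground interactions, so they may fail to fully capture the nonlinear and complex dynamics that arise in varied environments. Neural network models, on the other hand, have been widely used in this domain and have demonstrated powerful function approximation capabilities. However, these black-box learning strategies completely discard existing, well-established physics knowledge. In this paper, we seamlessly combine deep learning with a fully differentiable physics model to endow the neural network with available prior knowledge. The proposed model shows better generalization performance than a vanilla neural network model by a large margin. We also show that the latent features of our model can accurately represent lateral tire forces without any additional training. Finally, we develop a risk-aware model predictive controller using proprioceptive information derived from the latent features. We validate our idea in two autonomous driving tasks under unknown friction, outperforming the baseline control framework.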
Translated by Google Translate
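Neural networks have been increasingly used in model predictive controllers (MPC) to control nonlinear dynamic systems. However, MPC still faces the problem that its achievable update rate is insufficient to cope with model uncertainty and external disturbances. In this paper, we propose a novel control scheme that designs an optimal tracking controller using the neural network dynamics of the MPC, so that it can be applied as a plug-and-play extension to any existing model-based feedforward controller. We also describe how our method handles neural networks that take history information as input and therefore do not follow a general dynamics form. The method is evaluated by its performance on classical control benchmarks with external disturbances. We also extend the control framework to an aggressive autonomous driving task with unknown friction. In all experiments, our method outperforms the compared methods. Our controller also shows low control levels, indicating that our feedback controller does not interfere with the optimal commands of the MPC.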
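Here we propose a new method that generates smooth control sequences in Model Predictive Path Integral control (MPPI) tasks without any additional smoothing algorithm. Our method effectively alleviates chattering in sampling, while the information-theoretic formulation of MPPI remains unchanged. We demonstrate the proposed method on a challenging autonomous driving task, with quantitative evaluation against different algorithms. We also present a neural network vehicle model for estimating system dynamics under varying road friction conditions. Our video can be found at: \url{https://youtu.be/o3nmi0ujfqg}.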
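For context, the sampling-based update that MPPI methods build on can be sketched as follows. This is a minimal sketch of the standard cost-weighted averaging step only, not the smoothing-free method the abstract proposes. The function name `mppi_update`, its arguments, and the restriction to a scalar control per time step are assumptions made for illustration.

```python
import math


def mppi_update(nominal, perturbations, costs, temperature=1.0):
    """One standard MPPI update of a scalar control sequence (illustrative).

    nominal:       list of T nominal controls
    perturbations: list of K sampled noise sequences, each of length T
    costs:         list of K rollout costs, one per noise sequence
    temperature:   positive scalar controlling how sharply low cost is favored
    """
    # Subtract the minimum cost for numerical stability before exponentiating.
    min_cost = min(costs)
    weights = [math.exp(-(c - min_cost) / temperature) for c in costs]
    total = sum(weights)
    weights = [w / total for w in weights]

    # New control = nominal control + cost-weighted average of the noise.
    return [
        u + sum(w * eps[t] for w, eps in zip(weights, perturbations))
        for t, u in enumerate(nominal)
    ]
```

Because every time step is updated from independently sampled noise, the resulting sequence can chatter, which is the issue the abstract sets out to address.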
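Automatic methods for predicting listeners' mean opinion score (MOS) have been studied to assure the quality of text-to-speech systems. Many previous studies focus on architectural advances (e.g., MBNet, LDNet) that capture the relationship between spectral features and MOS more effectively, and they have achieved high accuracy. However, the optimal representation in terms of generalization capability remains largely unknown. To this end, we compare the performance of self-supervised learning (SSL) features obtained with the wav2vec framework against that of spectral features such as the magnitude of the spectrogram and the mel-spectrogram. In addition, we propose combining the SSL features with features that we believe retain information essential to automatic MOS prediction, so that they compensate for each other's drawbacks. We conduct comprehensive experiments on a large-scale listening test corpus collected from past Blizzard and Voice Conversion Challenges. We find that the wav2vec feature set shows the best generalization even though the given ground truth is not always reliable. Furthermore, we find that the combinations perform best, and we analyze how they bridge the gap between the spectral and wav2vec feature sets.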
3D-aware image synthesis aims to preserve spatial consistency in addition to generating high-resolution images with fine details. Recently, Neural Radiance Field (NeRF) has been introduced for synthesizing novel views with low computational cost and superior performance. While several works investigate generative NeRFs and show remarkable results, they cannot handle conditional and continuous feature manipulation in the generation procedure. In this work, we introduce a novel model, called Class-Continuous Conditional Generative NeRF ($\text{C}^{3}$G-NeRF), which can synthesize conditionally manipulated photorealistic 3D-consistent images by projecting conditional features to the generator and the discriminator. The proposed $\text{C}^{3}$G-NeRF is evaluated on three image datasets: AFHQ, CelebA, and Cars. Our model shows strong 3D consistency with fine details and smooth interpolation in conditional feature manipulation. For instance, $\text{C}^{3}$G-NeRF achieves a Fr\'echet Inception Distance (FID) of 7.64 in 3D-aware face image synthesis at a $128^{2}$ resolution. Additionally, because $\text{C}^{3}$G-NeRF can synthesize class-conditional images, we report FIDs of the generated 3D-aware images for each class of the datasets.
Cellular automata (CA) captivate researchers due to the emergent, complex individualized behavior that simple global rules of interaction enact. Recent advances in the field have combined CA with convolutional neural networks to achieve self-regenerating images. This new branch of CA is called neural cellular automata [1]. The goal of this project is to use the idea of neural cellular automata to grow prediction machines. We place many different convolutional neural networks in a grid. Each conv net cell outputs a prediction of what the next state will be and minimizes its predictive error. Cells receive their neighbors' colors and fitnesses as input. Each cell's fitness score describes how accurate its predictions are. Cells can also move to explore their environment, and some stochasticity is applied to their movement.
There is a dramatic shortage of skilled labor for modern vineyards. The Vinum project is developing a mobile robotic solution, based on the quadruped robot HyQReal, to autonomously navigate through vineyards for winter grapevine pruning. This requires an autonomous navigation stack for the robot. This paper introduces an architecture for a quadruped robot to autonomously move through a vineyard by identifying and approaching grapevines for pruning. The higher-level control is a state machine that switches between searching for destination positions, autonomously navigating towards those locations, and stopping for the robot to complete a task. The destination points are determined by identifying grapevine trunks using instance segmentation from a Mask Region-Based Convolutional Neural Network (Mask-RCNN). These detections are passed through a filter to avoid redundancy and remove noisy detections. The combination of these features is the basis for the proposed architecture.
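The three-mode state machine described above could look roughly like the sketch below. The state names, the boolean events (`target_found`, `target_reached`, `task_done`), and the rule that the robot returns to searching once a task is finished are assumptions made for illustration; the abstract does not specify the actual transition conditions.

```python
from enum import Enum, auto


class State(Enum):
    SEARCHING = auto()   # looking for the next destination position
    NAVIGATING = auto()  # moving towards the chosen destination
    STOPPED = auto()     # standing still while a task is carried out


def next_state(state, target_found, target_reached, task_done):
    """Return the next high-level state given three hypothetical events."""
    if state is State.SEARCHING and target_found:
        return State.NAVIGATING
    if state is State.NAVIGATING and target_reached:
        return State.STOPPED
    if state is State.STOPPED and task_done:
        return State.SEARCHING
    # No relevant event fired: stay in the current state.
    return state
```

Keeping the transition logic in one pure function makes each mode switch easy to test in isolation from perception and locomotion.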
Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning (RL), but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality and outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
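To make the greedy criterion concrete, the sketch below computes the conditional mutual information exactly from a known discrete joint distribution and picks the next feature to query. It illustrates the oracle policy that the abstract says is impractical without access to the data distribution; it is not the amortized learning approach the authors propose. The representation of the joint distribution as a dictionary and the function names are assumptions made for illustration.

```python
import math
from collections import defaultdict


def conditional_mutual_information(joint, observed, i):
    """I(Y; X_i | X_S = x_S) in nats for a discrete joint pmf.

    joint:    dict mapping (x_tuple, y) -> probability
    observed: dict mapping feature index -> observed value (the set S)
    i:        index of the candidate feature
    """
    # Keep only outcomes consistent with the observed features.
    p_xy = defaultdict(float)
    for (x, y), p in joint.items():
        if all(x[j] == v for j, v in observed.items()):
            p_xy[(x[i], y)] += p
    total = sum(p_xy.values())
    if total == 0.0:
        return 0.0

    # Conditional marginals of X_i and Y given the observations.
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (xi, y), p in p_xy.items():
        p_x[xi] += p / total
        p_y[y] += p / total

    cmi = 0.0
    for (xi, y), p in p_xy.items():
        q = p / total
        if q > 0.0:
            cmi += q * math.log(q / (p_x[xi] * p_y[y]))
    return cmi


def greedy_select(joint, observed, num_features):
    """Pick the unobserved feature with the largest conditional MI."""
    candidates = [i for i in range(num_features) if i not in observed]
    return max(
        candidates,
        key=lambda i: conditional_mutual_information(joint, observed, i),
    )


# Toy distribution: Y copies X_0, while X_1 is independent noise.
toy_joint = {((a, b), a): 0.25 for a in (0, 1) for b in (0, 1)}
first_query = greedy_select(toy_joint, {}, 2)
```

In the toy distribution, feature 0 determines the label, so it carries log 2 nats of information and is queried first, while feature 1 carries none.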
In this paper, we learn a diffusion model to generate 3D data on a scene-scale. Specifically, our model crafts a 3D scene consisting of multiple objects, while recent diffusion research has focused on a single object. To realize our goal, we represent a scene with discrete class labels, i.e., categorical distribution, to assign multiple objects into semantic categories. Thus, we extend discrete diffusion models to learn scene-scale categorical distributions. In addition, we validate that a latent diffusion model can reduce computation costs for training and deploying. To the best of our knowledge, our work is the first to apply discrete and latent diffusion for 3D categorical data on a scene-scale. We further propose to perform semantic scene completion (SSC) by learning a conditional distribution using our diffusion model, where the condition is a partial observation in a sparse point cloud. In experiments, we empirically show that our diffusion models not only generate reasonable scenes, but also perform the scene completion task better than a discriminative model. Our code and models are available at https://github.com/zoomin-lee/scene-scale-diffusion
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO objective constrained to the unit ball in $\mathbb{R}^d$, we obtain the following results (up to polylogarithmic factors). We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator. For $\epsilon_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of [BJLLS19] while maintaining the optimal total work of stochastic gradient descent. We give an $(\epsilon_{\text{dp}}, \delta)$-differentially private algorithm which, given $n$ samples of Lipschitz loss functions, obtains near-optimal optimization error and makes $\min(n, n^2\epsilon_{\text{dp}}^2 d^{-1}) + \min(n^{4/3}\epsilon_{\text{dp}}^{1/3}, (nd)^{2/3}\epsilon_{\text{dp}}^{-1})$ queries to the gradients of these functions. In the regime $d \le n \epsilon_{\text{dp}}^{2}$, where privacy comes at no cost in terms of the optimal loss up to constants, our algorithm uses $n + (nd)^{2/3}\epsilon_{\text{dp}}^{-1}$ queries and improves recent advancements of [KLL21, AFKT21]. In the moderately low-dimensional setting $d \le \sqrt n \epsilon_{\text{dp}}^{3/2}$, our query complexity is near-linear.